Designing an Improved Discriminative Word Aligner
Authors
Abstract
The quality of statistical machine translation systems depends on the quality of the word alignments computed during the translation model training phase. IBM generative alignment models, despite their poor quality compared to a gold standard, perform well in practice. In this paper, we propose an improved word aligner based on a maximum entropy alignment combination model, which employs better feature engineering, ℓ1 regularization, and an enhanced search space to improve the quality of both alignment and translation. For the Arabic-English language pair, we are able to reduce the Alignment Error Rate by 43.4% and achieve a gain of ≈ 1 BLEU point over the IBM Model 4 symmetrized alignments. These improvements are attainable at a lower computational cost, using only easy-to-estimate HMM and IBM Model 1 features. An analysis of the obtained results shows that a good balance between several alignment characteristics should be maintained in order to deliver good translation quality.
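For readers unfamiliar with the metric, the Alignment Error Rate mentioned in the abstract is commonly defined over a hypothesis link set A, gold "sure" links S, and gold "possible" links P (with S contained in P) as AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name, the link representation as (source index, target index) pairs, and the toy data are assumptions made for illustration.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Standard AER: 1 - (|A & S| + |A & P|) / (|A| + |S|), with S a subset of P.

    Each argument is an iterable of (source_index, target_index) links.
    The name and signature are illustrative, not taken from the paper.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # sure links also count as possible links
    if not a and not s:
        return 0.0  # nothing to align and nothing proposed: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy example (made-up links, not data from the paper).
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 1)}
hypothesis_links = {(0, 0), (2, 1), (2, 2)}
print(alignment_error_rate(hypothesis_links, sure_links, possible_links))
```

Lower values are better: a hypothesis that contains every sure link and no link outside the possible set scores 0.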
Similar resources
EMDC: A Semi-supervised Approach for Word Alignment
This paper proposes a novel semisupervised word alignment technique called EMDC that integrates discriminative and generative methods. A discriminative aligner is used to find high precision partial alignments that serve as constraints for a generative aligner which implements a constrained version of the EM algorithm. Experiments on small-size Chinese and Arabic tasks show consistent improveme...
Soft Syntactic Constraints for Word Alignment through Discriminative Training
Word alignment methods can gain valuable guidance by ensuring that their alignments maintain cohesion with respect to the phrases specified by a monolingual dependency tree. However, this hard constraint can also rule out correct alignments, and its utility decreases as alignment models become more complex. We use a publicly available structured output SVM to create a max-margin syntactic align...
A Lightweight and High Performance Monolingual Word Aligner
Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-o...
Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques
Automatic lexical alignment is a vital step for empirical machine translation, and although good results can be obtained with existent models (e.g. Giza++), more precise alignment is still needed for successfully handling complex constructions such as multiword expressions. In this paper we propose an approach for lexical alignment combining statistical and linguistic information. We describe t...
Improved Word Alignment with Statistics and Linguistic Heuristics
We present a method to align words in a bitext that combines elements of a traditional statistical approach with linguistic knowledge. We demonstrate this approach for Arabic-English, using an alignment lexicon produced by a statistical word aligner, as well as linguistic resources ranging from an English parser to heuristic alignment rules for function words. These linguistic heuristics have b...
Publication date: 2011